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in 

C^ Using a nonparametric function estimation methodology, we present a comparative 

t^^ analysis of the WMAP 1-, 3-, 5-, and 7-year data releases for the CMB angular power 

,__! spectrum with respect to the following key questions: (a) How well is the power spec- 

^~~* trum determined by the data alone? (b) How well is the ACDM model supported by 

►>■ a model-independent, data-driven analysis? (c) What are the realistic uncertainties 

S^ on peak/dip locations and heights? Our results show that the height of the power 

spectrum is well determined by data alone for multipole / approximately less than 546 
(1-year), 667 (3-year), 804 (5-year), and 842 (7- year data). We show that parametric fits 
based on the ACDM model are remarkably close to our nonparametric fits in /-regions 
where data are sufficiently precise. In contrast, the power spectrum for an HACDM 
model gets progressively pushed away from our nonparametric fit as data quality im- 
proves with successive data realizations, suggesting incompatibility of this particular 
cosmological model with respect to the WMAP data sets. We present uncertainties on 
peak/dip locations and heights at the 95% {2a) level of confidence, and show how these 
uncertainties translate into hyperbolic "bands" on the acoustic scale (Ia) and peak shift 
i4>m) parameters. Based on the confidence set for the 7-year data, we argue that the 



low-l up-turn in the CMB power spectrum cannot be ruled out at any confidence level 
in excess of about 10% (~ 0.12a). Additional outcomes of this work are a numerical 
formulation for minimization of a noise-weighted risk function subject to monotonicity 
constraints, a prescription for obtaining nonparametric fits that are closer to cosmolog- 
ical expectations on smoothness, and a method for sampling cosmologically meaningful 
power spectrum variations from the confidence set of a nonparametric fit. 

Subject headings: cosmic background radiation — cosmological parameters — Methods: 
data analysis — Methods: statistical 



1. Introduction 

The angular power spectrum of cosmic microwave background (CMB) temperature fluctuations 
is a measurable physical quantity that depends sensitively on the physics of the early universe. In 
particular, the shape of the angular power spectrum and the locations and heights of its peaks 
relate directly to parameters in the underlying cosmological models. As such, it has been used 
extensively as an acid test of the relative merit of competing cosmological models, and as a rich 
source of information about cosmological parameters themselves. 

Traditionally, and almost exclusively, cosmologists have resorted to model-based parametric 
statistical methods for estimating the CMB angular power spectrum from data. Parametric regres- 
sion methods require the functional form of the unknown regression function / to be pre-specified. 
The adjustable parameters in /, finite in number that is independent of the data size, are usually 
estimated by maximizing an appropriate likelihood function or a posterior distribution. In the 
cosmological context, it is conventional to employ parametric models that attempt to capture the 
essential physics of the problem via the pre-specified functional form, and any pre-existing knowl- 
edge about parameters is incorporated in the estimation process via appropriate prior distributions. 

Nonparametric function estimation methods, on the other hand, assume no specific functional 
form for /, except for mild regularity conditions such as smoothness assumptions, membership to a 
function space, etc. In this approach, the number of parameters that define the unknown regression 
function / is either infinite or grows proportionally with the data size, and the estimate /at of / 
is obtained by balancing bias and variance of /tv via optimal smoothing. Nonparametric methods 
are therefore model-independent, and are based on fewer and less restrictive assumptions about /. 
This, in turn, implies that any inferences about / made from nonparametric regression analyses tend 
to be more data-driven as opposed to being primarily model-driven. In other words, to a greater 
extent nonparametric function estimation methods tend to infer what is rather than what should 
be. As such, nonparametric regression methods can be meaningfully employed as sanity-enforcing 
mechanisms on parametric analyses, thereby making the conclusions drawn more conservative. For 
example, a feature seen in a parametric fit that survives in a nonparametric analysis is likely to 
be a real and robust feature of the data itself, and not merely an artifact that is seen because a 



parametric model expects it to be there. 



Alternative methodologies, such as the nonparametric methodology (Genovese et al. 2004 



Bryan et al. 2007) used in this work, may allow posing inferential questions that are difficult 



to address using conventional methods. For example, this particular nonparametric methodology 
allows validating model-based, parametric fits against the confidence set for the nonparametric fit to 
the same data, possibly to rule them out as candidates for the true but unknown regression function. 
This methodology also has the formal advantage of being able to provide realistic uncertainties on 
any number of features of the angular power spectrum that are simultaneously valid at the same 
level of confidence. Such desirable features are arguably lacking in most mainstream methodologies, 
including Bayesian, that are commonly used in cosmology. (An incisive, insightful, and discerning 
discussion about the relative merits of this methodology over statistical methods conventionally 
employed in cosmology can be found in the two references cited above, and is best read in the 
original.) 



The four CMB angular power spectrum data sets (Hinshaw et al.|2003b 2007 Nolta et al.|2009 



Larson et al. 2011) released by the Wilkinson Microwave Anisotropy Probe (WMAP) (Bennett 



et al. 2003) mission, representing cumulative observations at the end of 1, 3, 5, and 7 years of 



operation, present a unique opportunity for statistical analysis. For example, till date these are 
claimed to represent the most precise and extensive full-sky CMB measurements ever made (only 



to be superseded by the Planck mission (Tauber et al. 2010)). The four data sets represent the 
evolution of data over a period of about a decade, thereby making it possible to assess progressive 
and possibly systematic resolution of features of the spectrum. From a statistical perspective, each 
of these moderately large data sets (minimum of 899 data points for the WMAP 1-year release) is 
not only heteroskedastic, but also has substantial correlations that arise in typical data pipelines 
( Tegmark|[T997| [Hinshaw et al.|[2003a| |2009t [Jarosik et al.||2007t [Jarosik et al.||2011 ). 

In this paper, we present a comparative nonparametric analysis of the WMAP 1-, 3-, 5-, and 
7- year data sets for the angular power spectrum. Specifically, for each data realization, we address 
the following three key questions: (a) How well is the angular power spectrum determined by the 
data alone? (b) How well is the ACDM model supported by a model- independent, nonparametric, 
data-driven analysis? (c) What are the realistic uncertainties on peak/dip locations and heights? 



Our analysis is based on a nonparametric function estimation methodology (Genovese et al. 2004 



Bryan et al. 2007), which is discussed in Sec. ^together with our extensions; i.e., a numerical 
formulation for minimization of a inverse-noise-weighted risk function subject to monotonicity con- 
straints, a prescription for obtaining nonparametric fits that are closer to cosmological expectations 
on smoothness, and a method for sampling cosmologically meaningful power spectrum variations 
from the confidence set of a nonparametric fit. Results are presented in Sec. [3j and conclusion in 
Seed 



2. Methodology 



The nonparametric function estimation methodology ( Genovese et al.[2004 Bryan et al.||2007 ) 
used in this work is an extension of the REACT methodology (Beran 2000a|b), which is in turn 



founded on rigorous formal results in (Beran & Diimbgen 1998 Beran 1996). Two early papers 



(The Pittsburgh Institute for Computational Astrostatistics 2003: Miller et al. 2002) used this 



methodology to analyze, under the assumption of homoskedasticity, a pre-WMAP data set that 
combined BOOMERanG, MAXIMA, and DASI data sets. A generalization of this formalism 
for dealing with heteroskedasticity via an inverse-noise-weighted loss function was developed in 
( Genovese et al.||2004 ). More recently, using the WMAP 1-year data, ( Bryan et al.|2007 ) illustrated 
how confidence intervals on cosmological parameters and boundaries in the cosmological parameter 
space can be inferred from the confidence set for a nonparametric power spectrum fit. 



In this section, we first present an operational outline of this methodology (Sec. 2.1 and 2.2). 



This outline is entirely based on (Genovese et al. 2004: Bryan et al.||2007 ) and is included here for 
completeness. A pedagogic treatment of the central ideas and a simpler variant of the problem 
can be found in (Wasserman 2006). Specific citations to other sources are provided wherever 
appropriate. 

Our own numerical formulation of the monotone risk minimization problem, where the risk 



function is derived from an inverse-noise- weighted loss function, is presented in Sec. 2.3, In Sec. 2.4 



we describe a systematic way of obtaining a monotone fit with smoothness that meets cosmological 



expectations. In Sec. 2.5 we describe our method for probing the confidence set; this is the basis 



for the results presented in Sec. 3.1 and 3.3 



2.1. The nonparametric fit 

We are given CMB angular power spectrum data of the form 

Yi = Ci + ei 



(1) 



consisting of N data points observed over multipole index range Imin < ^ < Imax- Here, C; stands for 
the value of the true but unknown power spectrum at multipole index /. The noise variables e/ are 
assumed to have a mean-0 normal distribution with known covariance matrix S. In practice, any 
reasonable estimate/approximation S of this covariance matrix, such as an inverse Fisher matrix 
for a pilot parametric fit, can be used in place of S. 

This nonparametric regression method is based on expanding the unknown regression function 
/, assumed to belong to an appropriate L2 function space, in a complete orthonormal basis {0j(x)}, 

as 

00 

j=0 



A basis that has proven useful in the CMB angular power spectrum context is the cosine basis 
defined over < x < 1: 

(pj{x) = < . (2) 

\V2cos{J7rx){j = l,2,...) 

Assuming that / is sufficiently smooth, we take 

N-l 

f{x) « ^ I3j4>j{x), 

3=0 

and estimate it as 

N-l 

E 

j=0 



fN{x) = V Pj4>j{x). (3) 



While the method is asymptotically (i.e., as the data size N — )• oo) basis- independent, choice of the 
basis may matter in any finite- A^ application; see (Beran 2000a|b) for a detailed discussion. This 



basis satisfies a discrete orthogonality property when the data Yi are available over an equispaced 
grid {xi = {2i + 1)/2A, < i < A — 1} consisting of zeros of (J)n{x). In the CMB context, any 
contiguous range of A integer multipole indices Imin ^ I ^ Imax can be formally mapped onto this 
equispaced grid, hence we will not make any categorical distinction between data index i and the 
corresponding multipole index /. 

The true angular power spectrum Ci = f{xi) is estimated as Ci = fN{xi) via coefficient 
estimates /3j, which are estimated as 

Pj = \jZ,, (4) 



j=0 



where 



Here, U is the orthonormal matrix with elements U-ij = 4>j{xi)/yN and Y = (Yq) . . • ) ^-i) . 
The task of obtaining coefficient estimates /3 = (/3o, . . . , Pn-i)^ , and thereby the fit fN{xi), is now 
relegated to determining the shrinkage parameters Xj. Assuming smoothness for / (which implies 
a rapid decay of the true coefficients f3j with j), the shrinkage parameters Xj are constrained to be 
monotonically decreasing 

1 > Ao > Ai > . . . > Aat-i > (Monotone shrinkage). (6) 

A special, discrete subset of the monotone shrinkage defined above is the nested subset selection 
(NSS) shrinkage, defined as 

fl for 0< j < J , 

A,- = <^ ~ NSS shrinkage . (7 

lo for J<j< A 



Either shrinkage type results in selective damping of high-frequency harmonics in the data Yi, 
which results in smoothing of the fit /at. A useful interpretation of shrinkage parameter Xj is that 
it represents the effective degree of freedom for the j th coefficient estimate f3j . One can thus define 
the effective degrees of freedom (EDoF) for the entire fit /tv as 

N-l 

EDoF(A) = J2^r (8) 

j=0 

This definition follows from the fact that for a linear smoother, EDoF of a fit is formally defined as 
tr('H), where Ti is the matrix that connects the fitted values Y to the data YasY = T-LY. For the 
present nonparametric regression method, Ti = UDU , where U is the orthonormal basis matrix 
(Eq. |5|, and D = diag(Ao, . . . , Xn-i). This implies tr(?^) = tr(D) = YI^Sq^ >^i- 

In the present formalism, the discrepancy between the (unknown) regression function / and 
its estimator /tv is measured by the inverse- noise- weighted squared loss function L(X), defined as 

L(A) = / ( ■"■"' /"^-^^ I dx. 




Here, C7^(a;) is the (known) variance of the data Y at x, which accounts for the heteroskedasticity 
of the data Y. The loss L is considered a function of the vector of shrinkage parameters A = 
(Ao, . . • , Aat-i)-^ that determine the regression estimator /at via Eq. |4J Risk R{X), which is the 
expected value of -^(A), can be written as a sum of two non-negative terms; namely. 



-1 ff{x)-E(fM{x))\^ rl 

R{X) = / -\ ^ dx+ E 



aix] 







/jv(x)-E /^(x) 



o[x] 



dx. 



These two terms represent, respectively, the integrated squared bias and the integrated variance 
of fi^{x), both weighed by 1/(T^(x). Optimal smoothing is achieved, in principle, by minimizing 
R{X) with respect to A. Generally speaking, too little smoothing leads to a fit /at with low bias 
and high variance, and too much smoothing yields a fit with high bias and low variance. Minimal 
risk or optimal smoothing therefore can be thought of as a balance between the bias of f^ and its 
variance. 

The risk function i?(A), unfortunately, depends on the unknown regression function /, and 
therefore needs to be estimated. A particular estimator of this risk, which is of the SURE (Stein's 



unbiased risk estimator (Stein 1981)) kind, takes the following form: 

-R(A) = Z'^DWDZ + ti{DWDB) - ti{DWDB) , (9) 

where Z = {Zq, . . . , Zj^^if^ , D = diag(Ao, • • • , Ajv_i), D = In — D, B = U'^T,U/N is the covariance 
of Z, and In is the N x N identity matrix. The positive (semi)definite weight matrix W is defined 
as 

Wjk = Y,^JklWl, (10) 



N-l 



where wi is the /th coefficient in the expansion (l/a (x)) ~ X],=o 
(Eq.[2l), 



A 



■jki 



(t)j{x)(t)k{x)(t)i{x) dx 



w 



j'yj 



[x) and, for the cosine basis 



1, 
0, 

SjkSoi + SjiSok + SkiSoj, 

■^i^j+k + ^l,\j-k\) 






0} 
0} 
0} 
0} 



3, 
2, 

1, 
0. 



We denote the optimal shrinkage obtained by minimizing risk R{X) by A. The best NSS 
shrinkage \nss is obtained simply by evaluating risk R{X) for each of the (A'^ + 1) NSS shrinkage 
vectors and choosing the one with the least risk. Monotone shrinkage usually results in a lower risk 
than the NSS shrinkage because of the greater freedom available in the monotone set of shrinkages. 
We will discuss risk minimization subject to monotonicity constraints (Eq. IgI) in Sec. 2.3 

Fig.fllillustrates the contrasting behavior of the nonparametric risk (red curve) and the WMAP 



1-year likelihood function (green curve) (Verde et al. 2003) for the WMAP 1-year data (Hinshaw 



et al. 2003b), as a function of the EDoF of all NSS fits. This figure is motivated by the fact that 



cosmologists, by and large, are better-acquainted with parametric likelihood-based methods. Each 
integer value on the horizontal axis represents one NSS fit, from the zero function at EDoF = 
to the fit that simply interpolates through the data (EDoF = A^). Optimal smoothing for NSS 
shrinkage occurs at EDoF = 12 where the nonparametric risk function attains its minimum over 
the NSS set of fits. Likelihood function, on the other hand, keeps on improving with the EDoF 
indefinitely. 



2.2. Confidence set around the fit 

Conventional regression methods provide a confidence band around the fit that quantifies the 
uncertainty in the fit. In contrast, this nonparametric methodology quantifies the uncertainty 
surrounding the nonparametric fit in the form of an elegant construct, namely, a (1 — a) confidence 
set at a pre-specified confidence level < (1 — a) < 1. The (1 — q) confidence set for the coefficient 
vector /3 is defined as 

VN,a = {/?:(/? - dfW{(3 -d)<rl}, (U) 

which is centered at the vector of estimated coefficients /3, and the confidence radius r^ is given by 



TZn 



+ RiX). 

Here, Za is the upper a quantile of the standard normal distribution, and 

t'^/N = 2tr:{ABAB) + Z'^QZ - tr{QB), 



(12) 



(13) 



with Q = 4{ABA + WDBDW - 2ABDW) and A = DW + WD - W. In practice, the risk 
estimator (Eq. ^ and/or r^ (Eq. 13 ) may turn out to be negative for particular data/covariance 



matrix reahzations. In such cases, the squared confidence radius (Eq. 12) may be negative (or 
may not be for a = 0). For minimization purposes, the risk estimator (Eq. ^ is adequate and 



appropriate (Beran k, Diimbgen 1998). For confidence radius purposes, we suggest the following 



modifications to avoid the negativity problem: 

R+ = Z'^DWDZ + ia&:>i{Q,t^:{DWDB)-tiiDWDB)] 
tI/N = 2tr(AS^S) + max{0,Z^QZ-tr(QS)} 

T+Zo 



(14) 



' a+ 



max < 



N 



+ Ra 



At worst, this adjustment will make the confidence radius bigger, resulting into, e.g., wider con- 
fidence intervals, but more conservative inferences. Similar modifications have been suggested in 
dBeran fc Diimbgen|p98| |Beran||2000^ [Genovese et'ar]|2004D . 



The corresponding confidence set on the true regression function / is given by 

N-l 

13N,a = { fix) = Y^ Pj(t>j{x) : /3 G Vn,, 

j=0 



(15) 



The quadratic form of the inverse noise- weighted loss function and the fact that the weight matrix 
W is positive (semi) definite implies that both confidence sets V^^a and 13N,a are ellipsoidal in 
shape. For any functional T of the spectrum /, such as location or height of a peak or a dip, the 
(1 — a) confidence interval is defined as 



'-N,a 



min T{f), max T(/) 



(16) 



Moreover, prior information that the true regression function / belongs to a subset VN,a of the 
confidence set Pat^q (e.g., / has k peaks over the range of x-values represented in the data) can 
be incorporated in the analysis by replacing 'Dj\f^a with VN,a H V^^a- This methodology further 
provides the formal assurance that, asymptotically, 

1. B]\[,a (T^N.a) will contain the true spectrum / (true coefficient vector (3) with probability 
> (1 — a), and 

2. confidence intervals lN,a on any number of functionals T{f), computed from the confidence 
set BN,ai will be simultaneously valid at the same confidence level (1 — a), and that these will 
trap their corresponding true but unknown values with probability > (1 — q). 



2.3. Risk minimization subject to monotonicity constraints 



In this section, we show how the risk function R{X) (Eq. ^ can be minimized subject to the 
monotonicity constraints 1 > Aq > Ai > . . . > Aat-i > 0. The risk function corresponding to 



the unweighted loss function {W = In) has a simple weighted-sum-of-squares form, and can be 



minimized exactly and efficiently using the pooled adjacent violators (PAV) algorithm (Robertson 
et al. |1988 ). While the risk function corresponding to the inverse- noise- weighted loss function 
{W 7^ In) is still quadratic in A, it can no longer be expressed as a weighted sum-of-squares, and 
the PAV algorithm cannot be used to minimize it. 

It can be shown that, disregarding terms that do not depend on A, the risk function R{X) (Eq. 
[9]) can be written as 

^(A) = -X^HX - X^h, 

where Hjk = 2zjZkWjk, h = {H — V){1,1, . . . , 1)"^, and Vj]^ = 2WjkBkj. H and V are both man- 
ifestly symmetric. Positive (semi)definiteness of W implies that H is a. positive (semi) definite 
matrix, implying that -R(A) is a convex function. The system of linear inequality constraints 
1 > Ao > Al > . . . > Aat-i > implies that the constrained region (Fig. [2]) has a convex triangu- 
loidal shape determined by flat surfaces. The original risk minimization problem can therefore be 
formulated as the following equivalent convex quadratic minimization problem: 



where C is the {N 



Minimize 


R{X) = -X^HX-X^h 


subject to 


CX< (0,0,.. 


,of 


and 


< Ai < 1 for all i, 


1) X A^ matrix 
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An additional linear inequality or equality constraint of the form 

N-l N-l 



'^ Xi<q or y^ Xj 



i=Q 



i=0 



(17) 



(18) 



where q constrains the EDoF of the fit (0 < g < N), can easily be accommodated in this for- 
mulation. This reformulation of the monotone risk minimization problem makes it possible to use 
standard minimization methods (Pshenichny &: Danilin|1978 Powell|1985 Goldfarb Sz Idnani|1983 ) 



and computational tools ( Schittkowski 2007; Vanderbei 1999) for convex quadratic minimization 



problems subject to linear constraints and simple bounds. 
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2.4. Obtaining a smoother monotone fit that is closer to cosmological expectations 

To motivate the discussion in this section, consider the full-freedom monotone fit to the WMAP 
data sets obtained as above (green curves in Fig. [s] and |4]). By full-freedom monotone fit, we 
mean the fit that minimizes risk over the entire monotone-admissible region (Fig. p| without any 
restriction on the EDoF of the fit. This fit turns out to be quite wiggly especially at the high-/ 
end because of the high noise variance here. Such wiggliness implies presence of high-frequency 
components in the fit, which in turn implies a large number of EDoF in the fit. 

Without cosmological pre-conditioning (i.e., from a completely agnostic and data-driven per- 
spective) and when viewed in relation to the data, it is clear that this fit is not at all unreasonable, 
given the high noise levels at the high-/ end. However, all cosmological models suggest far smoother 
shapes for the angular power spectrum. In the context of the present methodology, one candidate 
for a smoother fit is the NSS fit (i.e., one that has the minimal risk over the set of (A^ -|- 1) NSS 



fits). Indeed, this possibility has been exploited, e.g., in (Genovese et al. 2004). However, the 
NSS fit (see the blue curve in Fig. ^ and H| may also turn out to be somewhat unsatisfactory with 
respect to cosmological expectations (and, some times, also with respect to trends refiected in the 
full-freedom monotone fit). This is primarily because of the limited freedom available in the NSS 
class. Notice again that the NSS fit is not entirely unreasonable from an agnostic viewpoint. 

The monotone set, on the other hand, offers the possibility of harnessing local minima in the 
risk function that are constrained to lie in appropriate "smoother" subsets of the full monotone 
set. This may be achieved in two distinct ways that may be combined for greater effect: 



1. By imposing one of the additional constraints (Eq. nm on the EDoF of the fit. Examples of 
such restricted- freedom monotone fits are the red curves in Fig. [3] and |4| 

2. By truncating the expansion (Eq. ^ to p number of coefficients {p < N) and then performing 
monotone risk minimization over this subset of the full monotone set. 

In practice, such smoother restricted-freedom monotone fit can be obtained by gradually reducing 
the value of g (Eq.|18[) starting from the EDoF of the NSS fit until all low-amplitude, high-frequency 
wiggles in the fit disappear. Generally, the resulting fit has a lower risk than the NSS fit with 
EDoF = q, and is manifestly consistent with trends captured by the full-freedom monotone fit. We 
find it useful to present (or consider) all three fits (NSS, full-freedom monotone, and restricted- 
freedom smoother monotone) together: This helps build a realistic picture about estimated trends 
in the data, and thereby about the shape of the underlying true spectrum. Like the NSS fit, 
the smoother restricted-freedom monotone fit will generally be more biased than the full-freedom 
monotone fit. This greater bias, however, is partially compensated for by a larger risk which results 



in a larger confidence radius value (Eq. 12), a larger confidence set, and therefore more conservative 
inferences. 



- 11 - 

2.5. Probing the confidence set for uncertainties on features of the fit 

In this paper, we need to probe the confidence set for a fit for two purposes: (a) for vahdating 



cosHiofogical models against a nonparametric fit (see Sec. 3.2). (b) for finding the uncertainties on 
specific features of the fit such as peak heights and locations, and Below we describe our method to 
scan and sample the confidence set for the latter. Our particular method for probing the confidence 
set for determining uncertainties on features of the fit is based on the following observations: 

1. The confidence set Vjy^a on the vector of coefficients (/3o, . . . ,(3n-i), by construction, is cen- 
tered at the vector of estimates (/3o, . . . , (3n-i)- 



2. The confidence interval defined in Eq. 16 requires locating extreme variations in any functional 
T of the power spectrum /; e.g., location or height of a peak. The largest possible variations 
in T will be located as far away from the center of the confidence set as possible, i.e., on its 
surface. 

3. Cosmologically meaningful and sufficiently smooth variations in the fitted spectrum Ci are 
most likely to be located in the projection of the full confidence set onto the lowest M 
dimensions. 

We therefore generate a uniform sample from the projection of the full confidence set surface 
onto the lowest d dimensions, where 2 < d < M, with M < 23. For convenience, we use the 
smoother restricted-freedom monotone fit for this purpose, with the justification that the confidence 
set corresponding to the full- freedom monotone fit happens to be nested inside that for this fit, for 
all four data realizations. The Cholesky factorization W = u u is used to transform the original 
confidence ellipsoid V^^a (whose principal axes may not be aligned with coordinate directions in 
the /3-space) into a sphere of the form {ijj : ||^ — ^|p < r^}, where ip = u/3 and 'ip = u/S. The 
surface of this ■0-sphere can be efficiently sampled with uniform density using a standard algorithm 



(Sec. 3. 4. I.E. 6 on p. 130 of (Knuth 1981)), and then transformed back into the /3-space, preserving 
uniformity of density because of the linearity of the transformation. From a sufficiently large sample 
of such variations of the power spectrum, we further selected those functions for which successive 
peaks and dips are separated by at least 50 multipole moments /. This cosmologically- motivated 
selection criterion ensures that (a) the sampled functions are sufficiently smooth, and (b) high- 
frequency wiggles are not counted as peaks or dips when estimating uncertainties on locations and 
heights of peaks and dips (see Sec. |3.3| ). Based on cosmological considerations, we restricted the 
search to functions with 3 peaks (WMAP 1-, 3-, and 5-year data) or 4 peaks (7-year data). The 
set of functions thus sampled is used to estimate uncertainties on specific features of the fit. As an 
aside, we note that the confidence set construct and the formal guarantees related to confidence 
intervals (Eq. 14) do not necessarily imply a uniform density over the confidence set; uniform 
sampling is used here as a convenient computational device for scanning the confidence set surface 
in an unbiased fashion. 
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Results and Discussion 



The four WMAP angular power spectrum data sets used in this work, ACDM parametric fits for 
the CMB angular power spectrum, and likelihood codes that produce their respective Fisher (inverse 
covariance) matrices are obtained from the WMAP data archive http : //lambda . gsf c . nasa . gov/. 
For all four data realizations, it turned out that (a) the weight matrix W (Eq. [10| ) is numerically 
positive definite, and (b) the confidence set for the full-freedom monotone fit is completely nested 
inside that for the smoother restricted- freedom monotone fit. It is worth noting that our nonpara- 
metric fits and confidence sets are not too sensitive to the details of the covariance matrix (this 



was also pointed out in (Bryan et al. 2007)). Most computations were done using the R statistical 



computing environment (R Development Core Team 2010). We used the QL codes ( Schittkowski 



2007) for the monotone risk minimization problem (Eq.|17[). Our nonparametric fits, obtained using 
the method outlined in Sec. 2.3, are shown in Fig. [3] and |4] for the WMAP 1-, 3-, 5-, and 7- year 
data sets. 



3.1. How^ well is the power spectrum determined by data alone? 

Considering that the noise in all four data sets is highly heteroskedastic and noise levels are 
especially high for large I, it would be useful to make an assessment of how such noise in the 
data affects local uncertainties in the fitted spectrum, and to quantify how well the angular power 
spectrum value at each / is determined by the data. 

To this end, we compute, for each data set, the approximate 95% confidence interval on each 
Ci using 5000 function variations from the confidence set as outlined in Sec. 



2.5 



The length of 

this vertical confidence interval at given /, divided by the absolute value of the fit, |C/|, provides an 
approximate indication of how well each C; is determined via the following interpretation: a value 
<C 1 indicates that the fit is well determined by the data, whereas values > 1 imply that the data 
contain little or no information about the height of the power spectrum. This approach, which is 



inspired by the boxcar probe approach of Genovese et al. (2004), has the practical advantage of 
not having adjustable parameters (i.e., the boxcar width) in the procedure. 

In Fig. [5j we plot this height, scaled by the value of the fit, as a function of the multipole 
index /, for all four data realizations. We see that the range of /-values over which the fit is well- 
determined expands consistently between 1-, 3- and 5-year data realizations, from / ~ 546 (1-year), 
to / Ri 667 (3-year), to / ?» 804 (5-year). On the other hand, while the /-range of the data expanded 
substantially between WMAP 5 and 7, the information contained in the data does not appear to 
have grown proportionately beyond / ~ 842 for the 7-year data. 
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3.2. How well is the ACDM model supported by a model-independent, 

nonparametric, data-driven analysis? 

In each of the four plots in Fig. |3] and |4j we have also included parametric fits based on the 



ACDM (Hinshaw et al. 2003b, 2007 Nolta et al. 2009; Larson et al. 2011) and HACDM (Pri- 



mack et al. 1995; Primack Sz Gross 2001) models (see figure caption for details). The parametric 



ACDM-based fits turn out to be quite close to the respective nonparametric fits wherever the 
data are precise. This is remarkable considering that our nonparametric method does not rely 
on any cosmologically-motivated prior information whatsoever. Moreover, the parametric ACDM- 
based fit appears to get closer to the respective nonparametric fit across the four WMAP data 
releases. The closeness of a parametric fit {Ci^.^, . . . ,Ci^^^) to the corresponding nonparametric 
fi^ (^'min' ■ • • ' C*/^^^) can be measured through the distance 



d{C, C) 



\ 



-1 f'lnax 

- y 

L — irnir. 



Ci-Ci 
en 



Using Eq. 12 , this distance can be further expressed as the smallest confidence level a beyond which 



the parametric fit is rejected as a candidate for the true spectrum. 

Table [T] lists distances of parametric fits based on the two cosmological models (ACDM and 
HACDM) from the nonparametric full-freedom monotone fit for the corresponding data realization 
(see caption for specific details and description). The progression of distance values between a 
parametric ACDM fit and the corresponding nonparametric fit clearly shows that the two are 
getting closer as the data become precise. In contrast to this, the angular power spectrum generated 
by the particular HACDM model considered, which is almost as strong a contender for the true 
power spectrum as the ACDM fit with respect to the 1-year confidence set, is progressively pushed 
away to the boundary of the 95% confidence set for the 7- year confidence set. Visually, this trend 
can be understood on the basis of the differences between the HACDM power spectrum and the 
nonparametric fit (e.g., differences in the heights of the first peak; see Fig.^andHl) that result into 
pushing this particular HACDM model out of the confidence set. Given the formal guarantees of 
this methodology (Sec. 2.2), the WMAP 7- year data thus rules out, at ~ 95% confidence level, the 
particular HACDM model considered. 



3.3. Uncertainties on locations and heights of peaks and dips 

We now consider the problem of determining uncertainties on the locations and heights of 
peaks and dips in the nonparametrically fitted spectrum. The motivation for this exercise comes 
from the fact that peak locations and heights contain valuable information about cosmological 



models and parameters (Doran &; Lilley 2002; Durrera et al. 2003). 



Following the prescription outlined in Sec. 2.5 we sampled a set of 5000 function variations 
from the confidence set for each data realization. Peak and dip locations and heights were recorded 
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for each peak and dip over this set of functions. This results into an empirical scatterplot that is 
indicative of the joint distribution of location and height for each peak or dip, under the assumption 
of uniform surface density on the confidence set T>N,a ■ 

Fig. [6] shows the results of probing the 95% confidence sets for uncertainties on peaks and 
dips, as outlined above, with 5000 acceptable function variations for each data realization. The box 
around a peak or a dip represents the largest horizontal and vertical variations in the scatter. In 



accordance with the confidence interval defined in Eq. 16 these form the 95% confidence intervals 
on the location and height of a peak/dip. Table [2] lists these confidence intervals together with 95% 
confidence intervals on peak height ratios. 

The following features of these results are worth pointing out. As is well-known, the first 
peak was very clearly resolved in the 1-year data itself. Our results are manifestly consistent with 
this observation, in the sense that its box is does not overlap with any other box. However, the 
uncertainty on the first peak does not shrink appreciably across the four data realizations. Further, 
our results clearly indicate that the second peak is resolved cleanly only in the 5-year data, whereas 
the third and fourth peaks are not resolved completely even in the 7-year data. 



3.4. Uncertainties on the acoustic scale (Ia) and peak shift {(pm) parameters 



Consider the following relationship (Hu et ar]|2001 Doran fc Lilley|[2002 ) between the location 



Im of the mth peak, the acoustic scale I a, and the shift parameter ^m: 

lm = lAim- (pm)- (19) 

Substituting the end-points of the 95% confidence interval for the mth peak location, this relation- 
ship results into a hyperbolic band of allowed values in the Ia — <Pm plane. Such bands, derived from 
95% confidence intervals on the first three peaks (Table ^, are shown in Fig. u\ Additional infor- 
mation from other sources is required to constrain these bands to physically meaningful regions in 
the Ia — (t>m plane. For example, if we assume I a = 300 ( Page et aL]|2003 ) then, based on the 7-year 



data, the 95% confidence intervals for </>„ wih be pi : (0.1600,0.3767), (/)2 : (0.0367,0.3600), 03 : 
(—0.2167, 0.7300). Conversely, additional constraints on prn could be used to generate a confidence 
interval on Ia- From a model-independent point of view, we note that the {lA,<Pm) bands for dif- 
ferent peaks m appear to overlap around (pm ~ and 200 ^ Ia ^ 400. We interpret this as a 
nonparametric revelation of the nearly harmonic structure of peaks in the CMB power spectrum. 



3.5. The low-Z up-turn from a nonparametric view^point 

Another interesting feature in Fig. [6] is the tiny but clearly observable scatter for the very first 
dip at the low-/ end. This scatter corresponds to extreme power spectrum variations that reside on 
the surface of the 95% confidence set and have an up-turn at low / values. In the ACDM cosmology. 
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such up-turn at the low-/ end is primarily the result of the integrated Sachs- Wolfe (ISW) effect, 
and is seen in all parametric ACDM fits in Fig. [3] and |4j It would therefore be interesting to 
see what could be said about the low-/ up-turn (and thereby about the ISW effect) based on the 
nonpar ametric confidence set. 

Notice that our nonparametric fits, which are at the center of their respective confidence sets, 
do not show a low-/ up-turn. However, the 7-year parametric ACDM fit, e.g., does show a clear 
up-turn at the low-/ end. This parametric fit is at a distance corresponding to confidence level 
of about 10% (~ 0.12a; see Table [I]) from our 7- year nonparametric full- freedom monotone fit. 
This means that the confidence set for the 7-year nonparametric fit contains spectra with a low-/ 
up-turn at most as far away as the 7- year parametric fit. We therefore conclude conservatively that 
the low-/ up-turn as a feature of the CMB angular power spectrum cannot be ruled out at any 
confidence level in excess of about 10% Actually, there are indications in our results (not shown) 
that such up-turned variations of the spectrum may be much closer to the center of the confidence 
set for the 7-year full-freedom monotone fit; this needs further investigation. 



4. Conclusion 

In this paper, we have presented a comparative nonparametric analysis of the WMAP 1-, 
3-, 5-, and 7-year data releases for the CMB angular power spectrum, using a nonparametric 



function estimation methodology (Genovese et al. 2004 Bryan et al. 2007). In the context of this 



methodology, we have also presented our own numerical formulation for minimization of the inverse- 
noise-weighted risk function subject to monotonicity constraints, and a prescription for obtaining 
monotone nonparametric fits that are closer to cosmological expectations on smoothness. For all 
data realizations, we have presented results pertaining to the following questions: (a) how well 
is the angular power spectrum determined by the data alone, (b) how well is the ACDM model 
supported by a model- independent, nonparametric, data-driven analysis, and (c) what are the 
realistic uncertainties on peak/dip locations and heights. 

The motivation for the analysis presented here was to explore what could be inferred about the 
CMB angular power spectrum in a model-independent, data-driven manner. On the other hand, 
the basic physics of the CMB is quite well established. It would therefore be useful to connect 
a nonparametric/model-independent analysis such as ours with the known physics of the CMB 
angular power spectrum. This is reserved for the future. 

To conclude, we have demonstrated in this paper the threefold utility of the nonparametric 
methodology used here for cosmological function estimation problems: as a method with sound 
formal guarantees, as a sanity-enforcing mechanism on parametric model-based analyses, and as a 
method that allows interesting inferential questions to be addressed and answered in a data-driven 
manner. 
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discussions covering aU of statistics. Our R codes for computing the nonparametric fit are based on 
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Fig. 1. — Nonparametric risk (red curve) and — 21og(likelihood) (blue curve) as functions of the 
EDoF (Eq. Isl) for the NSS fits to the WMAP 1-year data. This iUustrates the contrasting behavior 
of the two quantities. Optimal smoothing occurs at EDoF = 12 where the nonparametric risk 
attains its minimum over the NSS set of fits. Likelihood function, on the other hand, keeps on 
improving with the EDoF indefinitely, log (likelihood) values were computed using the WMAP 
1-year likelihood code ( Verde et al.||2003 ). The blue (left) and red (right) vertical scales on the plot 
are associated with the nonparametric risk and the —2 log (likelihood) respectively. 
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Fig. 2. — Trianguloid-shaped admissible regions, marked by red lines, for the monotonicity con- 
straint 1 > Ao > Ai > . . . > Xn-1 > for at = 2 (left) and 3 (right). The {N + 1) vertices of the 
trianguloid correspond to the {N + 1) NSS fits, with the origin corresponding to the zero function, 
and the vertex (1,1,...,!) corresponding to the function that exactly interpolates through the 



data. Surfaces with constant value p of EDoF (Eq. ^ are hyperplanes of the form ^ 
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Fig. 3. — Nonparametric fits for WMAP 1- (top) and 3-year (bottom) data sets, x- and y-ranges 
are identical across pfots. Green: full-freedom monotone fit (EDoF ~ 80.2,76.5 respectively); 
blue: NSS fit (EDoF = 12, 10 respectively); red: restricted-freedom monotone fit (EDoF ss 9.4,9.5 
respectively); solid gray: best ACDM-based parametric fit; dashed gray: power spectrum for an 
HACDM model. See Table [T] for details of model-based power spectra. 
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Fig. 4. — Nonparametric fits for WMAP 5- (top) and 7-year (bottom) data sets, x- and y-ranges are 
identical across plots. Green: full-freedom monotone fit (EDoF ~ 60.4,102.9 respectively); blue: 
NSS fit (EDoF = 13,20 respectively); red: restricted-freedom monotone fit (EDoF ps 14.4,14.1 
respectively); solid gray: best ACDM-based parametric fit; dashed gray: power spectrum for an 
HACDM model. See Table [T] for details of model-based power spectra. 
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Fig. 5. — The results of a probe of the confidence sets for the WMAP 1- (black), 3- (blue), 5- (red), 
and 7- year (green) nonparametric restricted- freedom monotone fits (Fig. Island [4]) to determine how 
well the angular power spectrum is determined by the data alone. The quantity plotted for each 
data realization is the total vertical variation at each / within the respective 95% {2a) confidence 
set, divided by the absolute value of the fit. This quantity is an approximate measure of how well 
the angular power spectrum is determined by the data: Values <C 1 indicate that the fit is tightly 
determined by the data, whereas values > 1 indicate that the data contain little or no information 
about the height of the angular power spectrum for that /. Disregarding the low-/ region, the 
color-coded vertical lines indicate the approximate l-value at which each curve rises above 1. 
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Fig. 6. — Nonparametric uncertainties on peak and dip locations and heights for the WMAP 1- 
(top left), 3- (top right), 5- (bottom left), and 7-year (bottom right) data sets. Nonparametric 
fit displayed for reference is the restricted-freedom monotone fit in Fig. [3] and |4j The number of 
acceptable function variations sampled from the confidence set for each data realization is 5000. 
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Fig. 7. — Confidence "bands" for the acoustic scale Ia and the shift (f)m for the mth peak, as 
derived from the 95% confidence intervals on the first three peak locations (Table pi) and Eq. 19 
Blue: (/>!, red: 02, green: (j)^. Top left: WMAP 1-year, top right: 3-year, bottom left: 5-year, and 
bottom right: 7-year data sets. Note that these {lA,4'm) bands for different peaks m appear to 
overlap around 0^ ~ and 200 ^ Ia 'S 400: We interpret this as a nonparametric revelation of the 
nearly harmonic structure of peaks in the CMB power spectrum. 
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Table 1: Distances of two model-based power spectra from our nonparametric fits 



Data 


ACDM 


HACDM 


1-year 

Ta = 0.3679 


0.1540 

16.70% 


0.1493 

15.87% 


3-year 

r„ = 0.3653 


0.1462 

14.52% 


0.2057 
29.03% 


5-year 

r„ = 0.3563 


0.1419 
19.66% 


0.3552 

94.82% 


7-year 

Ta = 0.3551 


0.1238 

9.08% 


0.3550 

94.98% 





Note. — These are the distances of two model-based (ACDM and HACDM) power spectra from our nonparametric 
(full-freedom monotone) fits. The HACDM model ( [Primack et al.|fl995||Primack fc Gro ss 2001) considered here for 
illustrative purposes is defined by a small neutrino fraction {Q,^h^ = 0.00275) with corresponding adjustment to the 
dark energy content (JIa = 0.729756), and the rest of the parameters (including zero curvature) being identical to 
that of the best ACDM model ( Larson et al.|2011 \ for the 7- year data (power spectrum generated using the CAMB 



software (Lewis et al.|2000|). ACDM-based fits used are the best parametric fits for the corresponding data realization 



Hinshaw et al.|2003b[|2007||Nolta et al.|2009| [Larson et al.|2011[ ). r^ is the confidence radius at a = 0.05 (i.e., 95% 



confidence level = 2a). Percentages reported are the confidence levels corresponding to these distances; these can 
be interpreted as asymptotic probabilities with which the corresponding parametric fit is ruled out as a candidate 
for the true but unknown spectrum. Notice the dramatic progression, as the data become precise, of (a) how this 
HACDM model is pushed to the boundary of the confidence set, and (b) how the ACDM model gets closer to the 
nonparametric fit. 
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